Statistical Models for Prediction, Chap.4

4.2 Binary Outcomes

Julia Romanowska

2024-12-19

Binary outcomes

  • model: logistic regression

\[ \mathrm{logit}(p(y = 1)) = a + \sum_i b_i \cdot x_i \]

  • estimation: ML, penalized ML
  • interpretation: coefficients relate to a 1-unit difference in \(x_i\)
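A minimal sketch of these points (the course materials use R; this is a Python stand-in with simulated data, not the book's code): fit the logistic model by maximum likelihood using Newton-Raphson, then read the coefficient as a log-odds change per 1-unit difference in \(x\).

```python
import numpy as np

# Simulate binary outcomes from a known logistic model (toy values)
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
true_a, true_b = -1.0, 0.8
p = 1 / (1 + np.exp(-(true_a + true_b * x)))
y = rng.binomial(1, p)

X = np.column_stack([np.ones(n), x])   # design matrix with intercept
beta = np.zeros(2)
for _ in range(25):                    # Newton-Raphson (IRLS) iterations
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))        # fitted probabilities
    W = mu * (1 - mu)                  # weights
    grad = X.T @ (y - mu)              # score vector
    hess = X.T @ (X * W[:, None])      # Fisher information
    beta = beta + np.linalg.solve(hess, grad)

a_hat, b_hat = beta
# exp(b_hat) is the odds ratio for a 1-unit difference in x
print(a_hat, b_hat, np.exp(b_hat))
```

With enough data the ML estimates land close to the generating values, which is the sense in which the coefficients are interpretable per 1-unit difference.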

\(R^2\) in logistic regression

better models have a wider spread in predictions

Fig.4.4

\(R^2\) on log-likelihood scale

\[ LL = \sum \left[ y \cdot \log(p) + (1-y) \cdot \log(1-p) \right] \]

  • perfect model: \(LL = 0\)
  • usually: \(LL < 0\) and deviance: \(-2LL > 0\)
  • comparing with null model = likelihood ratio:

\[ LL_0 = \sum \left[ y \cdot \log(\mathrm{mean}(y)) + (1-y) \cdot \log(1 - \mathrm{mean}(y)) \right] \]

\[ LR = -2(LL_0 - LL_1) \]

\(R^2\) on log-likelihood scale

\[ R^2 = \left( 1 - e^{-LR/n} \right) / \left( 1 - e^{2 LL_0 / n} \right) \]

  • Nagelkerke's \(R^2\); \(n\) = sample size

Fig.4.6
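The quantities above can be checked numerically from first principles. A hedged Python sketch (toy outcomes and predicted risks, not data from the book): compute \(LL\), \(LL_0\), the likelihood-ratio statistic, and Nagelkerke's \(R^2\).

```python
import numpy as np

# Toy outcomes and a model's predicted risks (hypothetical values)
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
p_hat = np.array([0.1, 0.2, 0.3, 0.6, 0.2, 0.7, 0.8, 0.4, 0.6, 0.9])
n = len(y)

# Log-likelihood of the model and of the null model (overall event rate)
ll1 = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
p0 = y.mean()
ll0 = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
lr = -2 * (ll0 - ll1)                  # likelihood-ratio statistic

r2_cox_snell = 1 - np.exp(-lr / n)
r2_nagelkerke = r2_cox_snell / (1 - np.exp(2 * ll0 / n))
print(round(r2_nagelkerke, 3))
```

Both log-likelihoods are negative, the deviance \(-2LL\) is positive, and \(R^2\) is rescaled to the unit interval by its maximum attainable value.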

Other \(R^2\)

reference from the book: “What’s the Best R-Squared for Logistic Regression?”, Paul Allison, 2013

# function from {performance} R package
?r2

Naïve Bayes

Bayes rule

  • prior probability of disease: \(p(D)\)
  • posterior probability of disease: \(p(D|x)\)
  • diagnostic likelihood ratio for symptom \(x\):

\[ LR(x) = \frac{p(x \mid D)}{p(x \mid \bar{D})} \]

\[ \mathrm{Odds}(D \mid x) = \frac{p(D)}{1 - p(D)} \cdot LR(x) \]

\[ \mathrm{logit}(D \mid x) = \mathrm{logit}(D) + \log(LR(x)) \]

  • similar to univariate logistic model!
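A numeric check of Bayes' rule with toy numbers (prior and likelihood ratio are hypothetical, not from the book): multiplying the prior odds by \(LR(x)\) gives the same posterior as adding \(\log(LR(x))\) on the logit scale.

```python
import numpy as np

p_D = 0.10        # prior probability of disease (toy value)
lr_x = 4.0        # diagnostic likelihood ratio of symptom x (toy value)

# Update on the odds scale: Odds(D|x) = Odds(D) * LR(x)
prior_odds = p_D / (1 - p_D)
post_odds = prior_odds * lr_x
p_post = post_odds / (1 + post_odds)

# Same update on the logit scale: logit(D|x) = logit(D) + log(LR(x))
logit_post = np.log(prior_odds) + np.log(lr_x)
p_post_logit = 1 / (1 + np.exp(-logit_post))
print(round(p_post, 4), round(p_post_logit, 4))
```

The two routes agree exactly, which is why a single-symptom Bayes update looks like a univariate logistic model.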

Prediction with Naïve Bayes

  • prediction for a combination of symptoms:
    the posterior after \(x_1\) is the prior for \(x_2\), the posterior after \(x_2\) is the prior for \(x_3\), etc.
    • only for conditionally-independent variables!
    • might give very good discrimination!
    • applied for effects of genetic markers
  • simple correction for correlated predictors:
    add calibration slope to the model

\[ \mathrm{logit}(y) = \alpha + \beta_{cal} \cdot lp_u \]
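The sequential updating above can be sketched in a few lines of Python (the likelihood ratios are hypothetical): chaining posteriors as priors amounts to summing \(\log(LR)\) terms on the logit scale, which also produces the uncalibrated linear predictor \(lp_u\).

```python
import numpy as np

p_D = 0.10                  # prior probability of disease (toy value)
lrs = [4.0, 2.5, 0.8]       # hypothetical LRs for symptoms x1, x2, x3

# Each posterior becomes the next prior: add log(LR) per symptom
logit = np.log(p_D / (1 - p_D))
for lr in lrs:
    logit += np.log(lr)
p_post = 1 / (1 + np.exp(-logit))

# `logit` is the uncalibrated linear predictor lp_u; with correlated
# predictors one would refit logit(y) = alpha + beta_cal * lp_u on data
# (beta_cal < 1 shrinks it), which this toy example does not do.
print(round(p_post, 4))
```

This is valid as a probability statement only under conditional independence of the symptoms; the calibration-slope refit is the simple correction when that fails.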

Machine learning

Neural networks

  • GAM
  • generalized nonlinear models: NN (neural networks)
  • input layer - hidden layer(s) - output layer
  • iterative learning
  • penalization to avoid “overtraining”
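A minimal sketch of the layer structure described above (weights are fixed toy values, not trained; training would iterate gradient steps, with a penalty term to avoid overtraining):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 0.3])       # input layer: 3 predictors
W1 = np.array([[0.2, -0.4, 0.1],     # input -> hidden weights (toy)
               [0.7,  0.3, -0.5]])
b1 = np.array([0.1, -0.2])
W2 = np.array([0.6, -0.9])           # hidden -> output weights (toy)
b2 = 0.05

hidden = sigmoid(W1 @ x + b1)        # hidden layer activations
p_hat = sigmoid(W2 @ hidden + b2)    # output layer: predicted probability
print(p_hat)
```

With a logistic output and no hidden layer this collapses to ordinary logistic regression; the hidden layer is what makes the model a generalized nonlinear one.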

Tree models

  • classification and regression tree (CART) aka recursive partitioning
  • splitting of patients based on cut-off:
    • maximum separation between subgroups
    • minimum variability within subgroup
  • many trees: random forest
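A hedged sketch of a single CART-style split on toy patient data: scan candidate cut-offs and keep the one minimizing total Gini impurity within the two subgroups (equivalently, maximizing separation between them).

```python
import numpy as np

def gini(y):
    # Gini impurity of a binary subgroup; 0 means perfectly pure
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2 * p * (1 - p)

age = np.array([25, 30, 35, 40, 55, 60, 65, 70])   # toy predictor
y   = np.array([ 0,  0,  0,  0,  1,  1,  1,  0])   # outcome per patient

best_cut, best_imp = None, np.inf
for cut in (age[:-1] + age[1:]) / 2:               # midpoints as candidates
    left, right = y[age <= cut], y[age > cut]
    imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
    if imp < best_imp:
        best_cut, best_imp = cut, imp
print(best_cut)
```

Recursive partitioning repeats this search within each resulting subgroup; a random forest grows many such trees on bootstrap samples and averages them.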

Tree models

advantages and disadvantages

advantages:

  • simple presentation
  • interaction effects incorporated

disadvantages:

  • all continuous variables categorized
  • cut-offs lead to overfitting
  • interactions implicitly assumed between all predictors
  • some predictors are only in certain branches
  • poor prediction performance of tree models

Other methods

  • multivariate adaptive regression splines (MARS)
  • support vector machine (SVM)
    • regression and classification